Starting from the observation of an $\mathbb{R}^n$-Gaussian vector of mean $f$ and covariance matrix $\sigma^2 I_n$ ($I_n$ is the identity matrix), we propose a method for building a Euclidean confidence ball around $f$, with prescribed probability of coverage. For each $n$, we describe its nonasymptotic property and show its optimality with respect to some criteria.

1. Introduction. In the present paper, we consider the statistical model

(1) $Y_i = f_i + \sigma\varepsilon_i, \quad i = 1,\ldots,n,$

where $f = (f_1,\ldots,f_n)'$ is an unknown vector, $\sigma$ a positive number and $\varepsilon_1,\ldots,\varepsilon_n$ a sequence of i.i.d. standard Gaussian random variables. For some $\beta \in\ ]0,1[$, the aim of this paper is to build a nonasymptotic Euclidean confidence ball for $f$ with probability of coverage $1-\beta$ from the observation of $Y = (Y_1,\ldots,Y_n)'$.

This statistical model includes, as a particular case, the functional regression model

(2) $Y_i = F(x_i) + \sigma\varepsilon_i, \quad i = 1,\ldots,n,$

where $F$ is an unknown function on some interval, say $[0,1]$, and the $x_i$'s are some distinct deterministic points in this interval. The literature on the topic usually deals with this particular model, which offers the advantage of focusing on the quantity $F$, which does not depend on $n$. This simplifies the asymptotic point of view. For this reason, we shall focus in this Introduction on the problem of building a confidence ball for $F$. In the sequel, we denote by $\|\cdot\|_n$ the seminorm defined on the set of real-valued functions $t$ on $[0,1]$

2 Y. BARAUD

The problem of building a confidence ball for $F$ with respect to $\|\cdot\|_n$ easily reduces to that of building a Euclidean confidence ball for the vector $f = (F(x_1),\ldots,F(x_n))'$ by identifying the functions $t$ on $[0,1]$ with the $\mathbb{R}^n$-vectors $(t(x_1),\ldots,t(x_n))'$. Thus, when $\sigma^2$ is known, say equal to 1, the problem is solved by considering the Euclidean ball centered at $Y$ with squared radius $q_{0,n}(\beta)$, where $q_{0,n}(\beta)$ denotes the $(1-\beta)$-quantile of a $\chi^2$-distribution with $n$ degrees of freedom. However, such a confidence ball is almost useless: besides providing a very rough estimator of $F$, the radius of the confidence ball is very large. To overcome this problem, a natural idea is to start with a "good" estimator of $F$, say $\hat F_n$, and then to estimate $\delta_n(F) = \|F - \hat F_n\|_n^2$ by some suitable estimator, say $\hat\delta_n$. This is the key point of the procedures proposed by Li (1989), Beran (1996) and Beran and Dümbgen (1998). In the last two papers, the estimators $\hat F_n$ and $\hat\delta_n$ are such that $\sqrt{n}(\delta_n(F) - \hat\delta_n)$ converges to some limit distribution $Q$ as $n$ becomes large. Thus, if one denotes by $Q^{-1}(1-\beta)$ the $(1-\beta)$-quantile of $Q$, the ball centered at $\hat F_n$ of squared radius $\hat\delta_n + Q^{-1}(1-\beta)/\sqrt{n}$ provides a confidence region with asymptotic probability of coverage $1-\beta$. The limit distributions $Q$ obtained in Beran (1996) and Beran and Dümbgen (1998) are both Gaussian of mean 0. However, their variances depend on $F$ and $\sigma$ and, consequently, $Q^{-1}(1-\beta)$ must be estimated in turn from the data. The disadvantage of the procedures proposed in Beran (1996) and Beran and Dümbgen (1998) mainly lies in their asymptotic character. It is indeed difficult to judge whether the asymptotic regime is achieved or not as it depends on the features of the unknown function $F$.
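
As a rough numerical illustration of why the trivial ball centered at $Y$ is so large, the following Python sketch simulates model (1) with $\sigma = 1$ and checks the coverage of a ball whose squared radius is a conservative stand-in for $q_{0,n}(\beta)$. The function names, the choice $n = 100$ and the use of the tail bound $n + 2\sqrt{n\log(1/\beta)} + 2\log(1/\beta)$ in place of the exact quantile are assumptions made for this illustration only.

```python
import math
import random


def conservative_sq_radius(n, beta, sigma2=1.0):
    """Upper bound on the (1 - beta)-quantile of sigma2 * chi-square(n).

    Relies on the tail bound P(chi2_n >= n + 2*sqrt(n*x) + 2*x) <= exp(-x)
    with x = log(1/beta). This is a stand-in for the exact quantile q_{0,n}(beta).
    """
    x = math.log(1.0 / beta)
    return sigma2 * (n + 2.0 * math.sqrt(n * x) + 2.0 * x)


def empirical_coverage(f, beta, n_rep=2000, seed=0):
    """Fraction of simulated samples Y = f + eps with ||Y - f||^2 <= radius^2."""
    rng = random.Random(seed)
    r2 = conservative_sq_radius(len(f), beta)
    hits = 0
    for _ in range(n_rep):
        # With sigma = 1, ||Y - f||^2 = ||eps||^2 whatever the value of f.
        dist2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in f)
        hits += dist2 <= r2
    return hits / n_rep


n, beta = 100, 0.10
f = [math.cos(2 * math.pi * i / n) for i in range(1, n + 1)]
coverage = empirical_coverage(f, beta)
# Squared radius divided by n: of order 1, i.e. the ball does not shrink with n.
radius_ratio = conservative_sq_radius(n, beta) / n
```

The coverage is guaranteed for every $f$, but the squared radius grows linearly with $n$, which is the weakness the rest of the paper addresses.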

In contrast, the asymptotic confidence balls proposed by Li (1989) are called honest in the sense that the probability of coverage is uniform with respect to all possible functions $F$. However, in Li (1989) the variance of the errors is assumed to be known and the radius of the confidence ball involves an inexplicit constant. His procedure is based on a Stein estimator of $F$, $\hat F_n$, and a Stein estimator of $\|F - \hat F_n\|_n$. A comparison between Li's confidence balls and ours will be given in Section 2.3.

Another direction was investigated by Cox (1993). He considered Bayesian inference for a class of regression models. The regression functions $F$ were drawn under a Gaussian prior distribution among the solutions of a high-order stochastic differential equation. He analyzed the $L^2([0,1],dx)$-distance between $F$ and its estimator $\hat F$ (the posterior expectation of $F$) and deduced a confidence ball for $F$. He proved that if $n$ is fixed (large enough) the frequentist probability of coverage of the confidence ball is close to 1 for all $F$ within a set of probability close to 1. However, this probability of coverage is infinitely often less than any positive $\varepsilon$ as $n$ tends to infinity for almost all $F$. Unfortunately, this negative result on Cox's confidence ball makes it unattractive for non-Bayesians.

The ideas underlying our approach are due to Lepski and have been exposed by their initiator in a series of lectures at the Institut Henri Poincaré in Paris. We shall now give a brief account of these ideas and recommend that the reader have a look at Lepski (1999) for more details. Lepski noted that if $F$ is known to belong to a suitable class $\mathcal{S}$ of smooth functions, then the minimax approach allows one to obtain both an estimator of $F$ and a control on the accuracy of the estimation. However, unless one has a strong guess on the particular features of $F$, $\mathcal{S}$ is usually too large to obtain an accurate estimation. The idea of Lepski is to test one or several additional structures on $F$ in order to improve the accuracy of estimation. Unlike an adaptive approach, an attractive feature of Lepski's approach lies in that the accuracy is available to the statistician and, consequently, that a nonparametric confidence ball for $F$ can be derived. This is explained in the papers by Lepski (1999) and by Hoffmann and Lepski (2002). However, the procedure described there for the purpose of building $L^2$-confidence balls suffers from the following weaknesses. First, the point of view is purely asymptotic. The procedure does not lead to confidence balls with prescribed probability of coverage for fixed values of $n$. Furthermore, a careful look at the proofs shows that, for a fixed $n$, the squared radius of the confidence ball is equal to a constant plus some term which is essentially proportional to the number of hypotheses to test. Consequently, the number of these cannot be large if one wants to keep the confidence ball of a reasonable size. In addition, the squared radius of the confidence ball is proportional to $1/\beta$ and is thus very large for small values of $\beta$. Finally, the applications developed in Lepski (1999) and Hoffmann and Lepski (2002) mainly address the Gaussian white noise model and an adaptation of the procedure to the regression case would require an estimation of the unknown $\sigma$.

The results of the present paper are nonasymptotic and the procedures which are described here aim at obtaining confidence balls which are as sharp as possible. In particular, the dependency with respect to $\beta$ and the number of hypotheses to test is only logarithmic. This allows us to handle the variable selection problem described in Section 2.4.

We consider the case where $\sigma^2$ is known to belong to some interval $I = [(1-\eta)\tau^2, \tau^2]$ with $\eta \ge 0$. The situation $\eta = 0$ corresponds to the theoretical situation where one exactly knows the variance. In contrast, the situation $\eta > 0$ corresponds to the practical one when the variance is known to belong to some interval which is either derived by the experimental context or by statistical estimation (from an independent sample). In all cases, the optimality (in a suitable sense) of our confidence balls is established. The proof relies on nonasymptotic lower bounds for the minimax estimation and separation rates over linear spaces. We show that if a confidence ball ensures the probability of coverage $1-\beta$ uniformly over all $f \in \mathbb{R}^n$ and $\sigma^2 \in I$, then its radius (normalized by $\sqrt{n}$) must be greater than $C\max\{\sqrt{\eta}, n^{-1/4}\}$, where $C$ is a constant free from $n$ and $\eta$. When $\eta = 0$, this result allows one to recover that established by Li (1989), namely that asymptotically the radius of such a confidence ball cannot converge toward 0 faster than $n^{-1/4}$. When $\eta > 0$, this result shows that practically the problem of establishing useful confidence balls is impossible unless $\eta$ is small compared to $n$.

The paper is organized as follows. In Section 2 we consider the case of a known $\sigma$ ($\eta = 0$) and describe a procedure free from any prior assumption on $f$. This procedure is implemented on numerical examples in Section 4. In Section 3, we consider the case $\eta > 0$ and provide some lower bounds on the radius of an honest confidence ball. We show in this section that these lower bounds are sharp by providing a construction of confidence balls which achieves these bounds. The proofs are postponed to Section 5.

Notation. Throughout this paper we use the following notation. We denote by $\|\cdot\|$ the Euclidean distance in $\mathbb{R}^n$. For a triplet $(z,d,u) \in \mathbb{R}_+\times\mathbb{N}\setminus\{0\}\times\ ]0,1[$, we denote by $\chi^2_{z,d}(\cdot)$ the distribution function of a (non)central $\chi^2$ with noncentrality parameter $z$ and $d$ degrees of freedom and by $q_{z,d}(u)$ its $(1-u)$-quantile for $u \in\ ]0,1[$. In particular, if $X$ is distributed as $\chi^2_{z,d}(\cdot)$, then

$\mathbb{E}[X] = z + d$, and $\mathbb{P}(X > q_{z,d}(u)) = u \quad \forall u \in\ ]0,1[$.

We will use the convention $q_{z,0}(u) = 0$ for all $u \in\ ]0,1[$ and $z \ge 0$. For each linear subspace $S$ of $\mathbb{R}^n$, we denote by $\Pi_S$ the orthogonal projector onto $S$ and by $\mathcal{B}(x,r)$ the Euclidean ball centered at $x \in \mathbb{R}^n$ of radius $r > 0$. Finally, $C, C', \ldots$ denote constants that may vary from line to line.

2. Confidence balls when the variance is known. The aim of this section is twofold: first, explain the basic ideas of our approach and second, in the ideal case where the variance $\sigma^2$ is known, build a confidence ball for $f$ with controlled probability of coverage.

2.1. The basic ideas. An ideal procedure to build a confidence ball would probably be to start with a nice estimator of $f$, say $\hat f$, and then get a uniform control of $\|f - \hat f\|$ over all possible $f$. This strategy is unfortunately impossible in general. For illustration, let us consider $\hat f = \Pi_S Y$, the projection estimator of $f$ onto a linear subspace $S$ of $\mathbb{R}^n$ of dimension $\mathcal{D} < n$. By setting $z$ equal to the squared Euclidean distance between $f$ and $S$ and using Pythagoras' theorem, we derive that

$\|f - \hat f\|^2 = z + \|\Pi_S\varepsilon\|^2\sigma^2$

and, hence, a control of $\|f - \hat f\|^2$ necessarily requires that an upper bound on $z$ be known. This is of course seldom the case in practice. The idea of our procedure is to get such a piece of information by means of a test. More precisely, let us fix some $\alpha \in\ ]0,1-\beta[$ and consider the $\chi^2$-test of level $\alpha$ of hypothesis "$f \in S$" against "$f \in \mathbb{R}^n\setminus S$" which consists in rejecting the null when the test statistic $T = \|Y - \Pi_S Y\|^2$ is greater than $q_{0,n-\mathcal{D}}(\alpha)\sigma^2$. If the test accepts the null, then intuitively this means that $f$ is close to $S$ and, therefore, that $z$ is small. The following lemma shows that $\|f - \hat f\|$ cannot be large on the event that the hypothesis "$f \in S$" is accepted.

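Lemma 2.1. Let $\alpha \in\ ]0,1-\beta[$. Let us define

(3) $\phi(Y) = \mathbf{1}\{\|Y - \Pi_S Y\|^2 > q_{0,n-\mathcal{D}}(\alpha)\sigma^2\}$

and

$\mathcal{Z} = \{z \ge 0,\ \chi^2_{z,n-\mathcal{D}}(q_{0,n-\mathcal{D}}(\alpha)) > \beta\}$.

If $\mathcal{D} \ne 0$, we set

(4) $\rho^2 = \sup_{z\in\mathcal{Z}}\left[z + q_{0,\mathcal{D}}\left(\frac{\beta}{\chi^2_{z,n-\mathcal{D}}(q_{0,n-\mathcal{D}}(\alpha))}\right)\right]\sigma^2$;

if $\mathcal{D} = 0$, we set

(5) $\rho^2 = \inf\{z \ge 0,\ \chi^2_{z,n}(q_{0,n}(\alpha)) \le \beta\}\,\sigma^2$.

Then, for all $f \in \mathbb{R}^n$,

(6) $\mathbb{P}_{f,\sigma}[\phi(Y) = 0,\ \|f - \hat f\| > \rho] \le \beta$.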

Let us assume that $\sigma = 1$ and make a few comments on the set $\mathcal{Z}$ and the quantity $\rho$. The inequality $\alpha < 1-\beta$ implies that 0 belongs to $\mathcal{Z}$ and, hence, the set $\mathcal{Z}$ is always nonvoid. Moreover, since the map $\psi: z \mapsto \chi^2_{z,n-\mathcal{D}}(q_{0,n-\mathcal{D}}(\alpha))$ is decreasing, continuous and tends to 0 as $z$ becomes large, it appears that $\mathcal{Z}$ is an interval of the form $[0,\bar z[$, where $\bar z$ satisfies $\psi(\bar z) = \beta$. When $\mathcal{D} = 0$ we deduce that $\rho^2 = \bar z$ and, consequently, that $\rho$ is finite. Since $q_{0,\mathcal{D}}(u)$ tends to 0 as $u$ approaches 1 from below, we see that $\rho^2$ is also finite when $\mathcal{D} \ne 0$. The supremum in (4) is usually achieved at some point $z^* \in \mathcal{Z}$. If the squared Euclidean distance between $f$ and $S$ equals $z^*$, then equality holds in (6). The quantity $z^*$ is a critical value for the (squared) distance $z$ between $f$ and $S$: if $z$ is large compared to $z^*$, then the test $\phi$ rejects the null with probability close to 1 and thus the left-hand side of (6) is small. This is also the case if, on the other hand, $z$ is small compared to $z^*$ because then $\hat f$ is a "good" estimator of $f$ and the event $\|f - \hat f\| > \rho$ seldom occurs.

The convention

(7) $q_{0,\mathcal{D}}(1) = -\infty$

allows one to define the quantity $\rho$ equivalently as

(8) $\rho^2 = \sup_{z\ge 0}\left[z + q_{0,\mathcal{D}}\left(\frac{\beta}{\chi^2_{z,n-\mathcal{D}}(q_{0,n-\mathcal{D}}(\alpha))}\wedge 1\right)\right]\sigma^2$.

In the sequel, we shall use this convention to simplify our notation.
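
The quantity $\rho^2$ in (8) has no closed form, but it can be approximated numerically. The sketch below is one possible way to do so in Python, under simplifying assumptions that are not part of the paper: $\sigma = 1$, both $\mathcal{D}$ and $n - \mathcal{D}$ are even (so that the central $\chi^2$ tail is a finite sum), and the supremum over $z$ is replaced by a maximum over a finite grid, which can only underestimate it. All function names are hypothetical.

```python
import math


def chi2_sf_even(x, d):
    """P(X > x) for a central chi-square with an even number d >= 2 of degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, d // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def quantile_even(u, d):
    """q_{0,d}(u), the (1 - u)-quantile, with the convention (7): q_{0,d}(1) = -inf."""
    if u >= 1.0:
        return -math.inf
    if u <= 0.0:
        raise ValueError("u must be positive")
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, d) > u:
        hi *= 2.0
    for _ in range(80):  # bisection on the decreasing survival function
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, d) > u:
            lo = mid
        else:
            hi = mid
    return hi


def ncx2_cdf_even(x, d, z):
    """Noncentral chi-square cdf: Poisson(z/2) mixture of central chi-squares with d + 2j df."""
    lam = z / 2.0
    weight = math.exp(-lam)
    total = 0.0
    for j in range(int(lam + 12 * math.sqrt(lam + 1) + 30) + 1):
        total += weight * (1.0 - chi2_sf_even(x, d + 2 * j))
        weight *= lam / (j + 1)
    return total


def rho_squared_grid(n, D, alpha, beta, steps=200):
    """Grid approximation (from below) of rho^2 in (8) with sigma = 1."""
    q_test = quantile_even(alpha, n - D)

    def psi(z):
        return ncx2_cdf_even(q_test, n - D, z)

    z_max = 1.0
    while psi(z_max) > beta:  # beyond this point the bracket in (8) equals -inf
        z_max *= 2.0
    best = -math.inf
    for i in range(steps + 1):
        z = z_max * i / steps
        u = min(beta / psi(z), 1.0)
        best = max(best, z + quantile_even(u, D))
    return best


n, D, alpha, beta = 20, 4, 0.20, 0.10
rho2 = rho_squared_grid(n, D, alpha, beta)
```

Because the grid maximum is a lower approximation of the supremum, a radius obtained this way should be read as indicative. A finer grid or a safety margin would be needed before using it as an actual confidence radius.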

Our procedure for building a confidence ball around $f$ is based on Lemma 2.1. As a control of $\|f - \hat f\|$ is possible when the hypothesis "$f \in S$" is accepted, we increase our chance to accept such hypotheses by considering a family of $S$'s rather than a single one. Moreover, in order to ensure that, for at least one $S$ the hypothesis "$f \in S$" is accepted, we add the linear space $S = \mathbb{R}^n$ to the family, the hypothesis "$f \in \mathbb{R}^n$" being obviously true.

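2.2. Construction of the confidence ball. Let $\{S_m, m \in \mathcal{M}_n\}$ be a finite family of linear subspaces of $\mathbb{R}^n$. For each $m$, we set $\mathcal{D}_m = \dim(S_m)$, $N_m = n - \mathcal{D}_m$ and associate with $S_m$ some number $\beta_m$ in $]0,1[$. We assume that the following assumption is fulfilled.

Assumption 2.1. The subscript $n$ belongs to $\mathcal{M}_n$ and $S_n = \mathbb{R}^n$. We have $\sum_{m\in\mathcal{M}_n}\beta_m \le \beta$.

For each $m \in \mathcal{M}_n$, we define $\rho_m$ as follows. If $m = n$, then

$\rho_n^2 = q_{0,n}(\beta_n)\sigma^2$.

If $m \in \mathcal{M}_n\setminus\{n\}$ and $\mathcal{D}_m \ne 0$, then $\rho_m$ is defined by (8) with $\mathcal{D}_m$ in place of $\mathcal{D}$ and $\beta_m$ in place of $\beta$. If $m \in \mathcal{M}_n\setminus\{n\}$ and $\mathcal{D}_m = 0$, then $\rho_m$ is defined by (5) with $\beta_m$ in place of $\beta$.

For each $m \in \mathcal{M}_n\setminus\{n\}$, we define $\hat f_m = \Pi_{S_m}Y$ and $\phi_m$ is the test defined by (3) with $S = S_m$. If $m = n$, then $\hat f_n = Y$ and $\phi_n(y) = 0$ for all $y \in \mathbb{R}^n$.

We define

$\mathcal{A} = \{m \in \mathcal{M}_n,\ \phi_m(Y) = 0\}$

and

(9) $\hat m = \arg\min_{m\in\mathcal{A}}\rho_m, \quad \hat\rho = \rho_{\hat m}, \quad \tilde f = \hat f_{\hat m}$.

We have the following result.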

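Theorem 2.1. Let $(\tilde f,\hat\rho)$ be the pair of random variables defined by (9). The region $\mathcal{B}(\tilde f,\hat\rho)$ is a confidence ball with probability of coverage $1-\beta$, that is,

(10) $\mathbb{P}_{f,\sigma}[f \in \mathcal{B}(\tilde f,\hat\rho)] \ge 1-\beta \quad \forall f \in \mathbb{R}^n$.

Moreover, for each $m \in \mathcal{M}_n$ and $f \in \mathbb{R}^n$, if for some $\gamma \in\ ]0,1[$ we have

(11) $\mathbb{P}_{f,\sigma}[\phi_m(Y) = 0] \ge 1-\gamma$ then $\mathbb{P}_{f,\sigma}[\hat\rho \le \rho_m] \ge 1-\gamma$.

In particular, for all $m \in \mathcal{M}_n$,

(12) $\inf_{f\in S_m}\mathbb{P}_{f,\sigma}[\hat\rho \le \rho_m] \ge 1-\alpha$.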

Let us make a few comments:

1. Inequalities (11) and (12) are clear from the definition of $\hat\rho$ since with probability not less than $1-\gamma$ (resp. $1-\alpha$) we have $m \in \mathcal{A}$. Inequality (12) provides an upper bound (in probability) for the random variable $\hat\rho$ under the law $\mathbb{P}_{f,\sigma}$ as soon as $f \in S_m$. Inequality (11) says that this upper bound remains valid not only when $f$ belongs to $S_m$ but also when $f$ is close to $S_m$, as then the test $\phi_m$ still accepts the hypothesis "$f \in S_m$" with large probability.

2. Note that $\mathcal{A}$ is nonvoid since $n$ belongs to $\mathcal{A}$. The case where $\hat\rho = \rho_n$ corresponds to the one where none of the hypotheses "$f \in S_m$" (with $m \in \mathcal{M}_n\setminus\{n\}$) is accepted. In this case, the resulting confidence ball is crude, namely centered at $Y$ of radius $\rho_n$. Note that when $\beta_n$ is chosen to be of order $\beta$, say $\beta/2$, the squared radius $\rho_n^2$ is of the same order as $\rho^2 = q_{0,n}(\beta)\sigma^2$, which means that the procedure does not lose too much compared with the trivial confidence ball $\mathcal{B}(Y,\rho)$.

3. In the proofs we show something stronger than Theorem 2.1. Namely, we prove that, with probability not less than $1-\beta$, $f$ belongs to the intersection of the Euclidean balls $\mathcal{B}(\hat f_m,\rho_m)$ for $m \in \mathcal{A}$. However, the resulting confidence region is no longer a ball in general.
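
The selection rule (9) can be sketched in a few lines of Python once the projections, test thresholds and radii are available. In the sketch below, the class `Candidate`, the helper names and all numerical values (thresholds and squared radii) are hypothetical placeholders chosen for illustration. In practice the thresholds would be $q_{0,N_m}(\alpha)\sigma^2$ and the radii would come from (5) and (8).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    label: str            # index m of the linear space S_m
    center: List[float]   # projection estimator of f onto S_m
    threshold: float      # acceptance threshold for ||Y - center||^2
    rho_sq: float         # squared radius rho_m^2 attached to S_m


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def accepted(y, candidates):
    """The set A: candidates whose test accepts the hypothesis 'f in S_m'."""
    return [c for c in candidates if sq_dist(y, c.center) <= c.threshold]


def select_ball(y, candidates):
    """Among accepted candidates, pick the one with the smallest radius, as in (9)."""
    pool = accepted(y, candidates)
    if not pool:
        raise ValueError("the family should contain the whole space, which is always accepted")
    return min(pool, key=lambda c: c.rho_sq)


y = [1.0, 2.0, 0.1, -0.2]
candidates = [
    # S_1 = span(e_1), S_2 = span(e_1, e_2); centers are the projections of y.
    Candidate("S_1", [1.0, 0.0, 0.0, 0.0], threshold=3.0, rho_sq=1.5),
    Candidate("S_2", [1.0, 2.0, 0.0, 0.0], threshold=2.5, rho_sq=2.5),
    # Whole space: center is y itself, residual 0, threshold 0 (convention q_{z,0} = 0).
    Candidate("R^n", list(y), threshold=0.0, rho_sq=6.0),
]
chosen = select_ball(y, candidates)
```

Here the hypothesis attached to `S_1` is rejected, so the ball is centered at the projection onto `S_2`, the accepted space with the smallest radius.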

The expressions of the quantities $\rho_m$ do not allow a direct appreciation of their orders of magnitude. An upper bound for $\rho_m$ is given in the following proposition. We restrict ourselves to the case where the dimension of $S_m$ is not larger than $n/2$. Indeed, considering linear spaces with dimension larger than $n/2$ leads to large radii and thus does not offer a real gain compared to $\mathbb{R}^n$. The proof of the following proposition contains explicit constants.

Proposition 2.1. Assume that, for all $m \in \mathcal{M}_n\setminus\{n\}$, $\mathcal{D}_m \le n/2$. Then there exists some constant $C$ depending on $\alpha$ only such that, for all $m \in \mathcal{M}_n$,

$\rho_m^2 \le C\max\{\mathcal{D}_m,\ \sqrt{n\log(1/\beta_m)},\ \log(1/\beta_m)\}\sigma^2$.

If $\mathcal{M}_n$ reduces to $\{n\}$, then $\hat\rho = \rho_n$ and the squared radius of the ball is of order $n\sigma^2$ by taking $\beta_n = \beta$. By considering several linear spaces $S_m$ we have the opportunity to capture some specific features of $f$ and consequently to reduce the order of magnitude of $\hat\rho$. The number of tests $|\mathcal{M}_n|$ to perform is taken into account via the quantity $\beta_m$. If one chooses $\beta_m = \beta/|\mathcal{M}_n|$ for all $m \in \mathcal{M}_n$, one gets that the radius of the confidence ball depends logarithmically on $|\mathcal{M}_n|$. However, a choice of $\beta_m$ depending on $m$ via the dimension of the linear space $S_m$, for example, is recommended. We shall see an example in Section 2.4.
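
To make the logarithmic dependence on $|\mathcal{M}_n|$ concrete, the following Python sketch evaluates the order of magnitude appearing in Proposition 2.1, up to the unspecified constant $C$, for the uniform choice $\beta_m = \beta/|\mathcal{M}_n|$. The function name and the numerical values are illustrative assumptions.

```python
import math


def radius_order(n, dim, beta_m):
    """max{D_m, sqrt(n log(1/beta_m)), log(1/beta_m)}: order of rho_m^2 / sigma^2."""
    log_term = math.log(1.0 / beta_m)
    return max(dim, math.sqrt(n * log_term), log_term)


n, beta, dim = 1000, 0.10, 5
small = radius_order(n, dim, beta / 10)        # 10 spaces, uniform weights
large = radius_order(n, dim, beta / 10 ** 6)   # one million spaces, uniform weights
growth = large / small                         # ratio of the two orders of magnitude
```

Multiplying the number of tests by $10^5$ inflates the bound by a factor $\sqrt{3.5} \approx 1.87$ only, since the dominant term here is $\sqrt{n\log(1/\beta_m)}$.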

2.3. Comparison with the procedure proposed by Li. In this section, we make a comparison between our procedure and that proposed by Li. To simplify the discussion we assume that $\sigma^2 = 1$. Li's procedure relies on a Stein estimator of $f$, say $f^*$, and a Stein estimator of $\|f - f^*\|^2$. The estimator $f^*$ is obtained by modifying a linear estimator of $f$, say $\hat f$. By taking $\hat f = \Pi_S Y$, where $S$ is a linear subspace of $\mathbb{R}^n$ of dimension $\mathcal{D} < n$, the confidence ball Li proposes is centered at

and its squared radius is given by 



where $c$ is an unspecified constant depending on $\beta$ and $\sigma^2$ only. He proved this confidence ball has probability of coverage $1-\beta$ for all $f \in \mathbb{R}^n$ simultaneously provided that $n$ is large enough. To compare this confidence ball to ours, let us make the a posteriori assumption that $f$ belongs to $S$. On the one hand, by using our procedure with $\mathcal{M}_n = \{m,n\}$, $S_m = S$, $\beta_m = \beta/2 = \beta_n$, we derive from Theorem 2.1 that, with probability close to 1, $\hat\rho^2 = \rho_m^2$, which is of order $\max\{\sqrt{n},\mathcal{D}\}$. On the other hand, replacing $\|Y - \Pi_S Y\|^2$ by its expectation $n - \mathcal{D}$ shows that the squared radius of Li's confidence ball is of order

$r^2 \approx c\sqrt{n} + n\left(1 - \frac{n-\mathcal{D}}{n}\right) = c\sqrt{n} + \mathcal{D}$

and is therefore of the same order as ours.

However, for those $f$ which do not belong to $S$ the radius of Li's confidence ball can become large. The advantage of our approach lies in that it is possible to deal with a larger family of spaces than just $\{S,\mathbb{R}^n\}$. By doing so, we can keep the radius of the confidence ball to a reasonable size for those vectors $f$ which are close to at least one of the linear spaces of the family and not only $S$.

2.4. Application to variable selection. In this section, we illustrate the procedure in the variable selection problem. Assume that $f$ is of the form $X\theta$, where $X$ is a known $n\times p$ full-rank matrix with $p \in \{1,\ldots,n\}$ and $\theta$ some unknown vector in $\mathbb{R}^p$. The problem of variable selection is to determine from the data the nonzero coordinates of $\theta$, that is,

$m^* = \{j \in \{1,\ldots,p\},\ \theta_j \ne 0\}$.

In this section we give a way to select those coefficients and provide simultaneously a confidence ball for $f$. We apply the procedure as follows:

Let $x_1,\ldots,x_p$ be the column vectors of the matrix $X$ and let $\mathcal{P}_n$ be the class of nonempty subsets $m$ of $\{1,\ldots,p\}$ with cardinality $|m|$ not larger than $n/2$. For all $m \in \mathcal{P}_n$, we define $S_m$ as the linear span of the $x_j$'s for $j \in m$ and set



$\beta_m = \frac{\beta}{n}\binom{n}{\mathcal{D}}^{-1}$ with $\mathcal{D} = |m|$.



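We define $\mathcal{M}_n = \mathcal{P}_n \cup \{n\}$ and set $\beta_n = \beta/2$. Note that Assumption 2.1 is fulfilled since

$\sum_{m\in\mathcal{M}_n}\beta_m = \frac{\beta}{2} + \sum_{m\in\mathcal{P}_n}\beta_m = \frac{\beta}{2} + \sum_{1\le\mathcal{D}\le n/2}\ \sum_{m\in\mathcal{P}_n,\,|m|=\mathcal{D}}\beta_m \le \beta$.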

By applying the procedure described in Section 2.2 we select a set of indices $\hat m$ for which the Euclidean distance between the least-squares estimator $\hat f_{\hat m}$ and $f$ is not greater than $\rho_{\hat m}$ with probability greater than $1-\beta$. Since $f$ belongs to the linear space $S_{m^*}$, with probability greater than $1-\alpha$ the set $m^*$ belongs to $\mathcal{A}$ and consequently $\rho_{\hat m}$ is not greater than $\rho_{m^*}$. Therefore, either $\hat m = m^*$ and then the procedure selects the target subset $m^*$, or $\hat m \ne m^*$ and then the resulting confidence ball is at least as accurate as if the target subset $m^*$ were selected. In addition, thanks to the inequality

$\binom{n}{\mathcal{D}} \le \exp(\mathcal{D}\log(en/\mathcal{D}))$

and Proposition 2.1, with probability greater than $1-\alpha$, the following upper bound holds: there exists some constant $C$ depending on $\alpha$ and $\beta$ only such that

$\hat\rho^2 \le C\max\{\sqrt{n|m^*|\log(en/|m^*|)},\ |m^*|\log(en/|m^*|)\}\sigma^2$.

Let us denote this upper bound by $B$. Another possible choice of the $\beta_m$'s is $\beta_m = \beta_n = \beta/|\mathcal{M}_n|$ for all $m \in \mathcal{M}_n$. For this second strategy, $\hat\rho^2$ is of order $B' = \max\{\sqrt{np},p\}\sigma^2$ as $|\mathcal{M}_n|$ is of order $2^p$. In the least favorable situation where almost all the coefficients $\theta_j$'s are nonzero, $|m^*|$, $p$ and $n$ are of the same order and, thus so are $B$ and $B'$. In this case, both strategies lead to confidence balls which are approximately of the same size. Yet, in the more favorable situation where $p$ is still of order $n$ but $|m^*|$ is small compared to $p$, the strategy with nonconstant $\beta_m$'s leads to a sharper confidence ball. This illustrates the advantage of taking $\beta_m$ as a function of $m$.
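
The bookkeeping behind dimension-dependent weights can be checked numerically. The Python sketch below assumes weights of the form $\beta_m = \beta/(n\binom{n}{\mathcal{D}})$ with $\mathcal{D} = |m|$, which is one choice compatible with Assumption 2.1 and with the binomial inequality used above. This specific form, the function names and the values of $n$ and $p$ are assumptions of the illustration.

```python
import math


def weight(beta, n, card):
    """Assumed weight beta_m = beta / (n * C(n, D)) for a subset m with |m| = D = card."""
    return beta / (n * math.comb(n, card))


def total_weight(beta, n, p):
    """Sum of beta_m over all nonempty subsets of {1..p} of size <= n/2, plus beta_n = beta/2."""
    total = beta / 2.0
    for card in range(1, min(p, n // 2) + 1):
        # There are C(p, card) subsets of cardinality card, all with the same weight.
        total += math.comb(p, card) * weight(beta, n, card)
    return total


def log_inverse_weight_bound(beta, n, card):
    """Bound log(n/beta) + D log(en/D) on log(1/beta_m), from C(n, D) <= exp(D log(en/D))."""
    return math.log(n / beta) + card * math.log(math.e * n / card)


beta, n, p = 0.10, 40, 30
total = total_weight(beta, n, p)
gaps = [
    log_inverse_weight_bound(beta, n, card) - math.log(1.0 / weight(beta, n, card))
    for card in range(1, n // 2 + 1)
]
```

With such weights, $\log(1/\beta_m)$ grows like $|m|\log(en/|m|)$, so small models are penalized much less than with the uniform choice $\beta/|\mathcal{M}_n|$.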
3. Confidence balls under some information on the variance. In this section, we no longer assume that $\sigma$ is known but rather that it belongs to some known interval $I = [\sqrt{1-\eta}\,\tau,\tau]$, where $(\tau,\eta) \in \mathbb{R}_+\times[0,1[$. As we shall see, the uncertainty on the value of $\sigma$ has a terrible effect on the orders of magnitude of radii of confidence balls.

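3.1. How sharp can the confidence ball be? We have the following result.

Theorem 3.1. Let $\alpha$ and $\beta$ be numbers in $]0,1[$ satisfying $2\beta + \alpha \le 1 - \exp(-1/36)$. Let $(\tilde f,\tilde r)$ be a pair of random variables depending on $Y$ only with values in $\mathbb{R}^n\times\mathbb{R}_+$ satisfying, for all $f \in \mathbb{R}^n$ and $\sigma \in I$,

(13) $\mathbb{P}_{f,\sigma}[f \in \mathcal{B}(\tilde f,\tilde r)] \ge 1-\beta$.

For each $m \in \mathcal{M}_n$, let $r_m$ be some positive quantity satisfying for all $\sigma \in I$

(14) $\inf_{f\in S_m}\mathbb{P}_{f,\sigma}[\tilde r \le r_m] \ge 1-\alpha$.

Then there exists some constant $C$ depending on $\alpha$ and $\beta$ only such that, for all $m \in \mathcal{M}_n$,

(15) $r_m^2 \ge C\max\{\eta N_m,\ \mathcal{D}_m,\ \sqrt{N_m}\}\tau^2$.

For each $f \in \mathbb{R}^n$ let $r(\alpha,f)$ be such that, for all $\sigma \in I$,

$\mathbb{P}_{f,\sigma}\{\tilde r \le r(\alpha,f)\} \ge 1-\alpha$.

Then we have

(16) $r^2(\alpha,f) \ge C\max\{\eta n,\ \sqrt{n}\}\tau^2$.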

To keep our formula as legible as possible, the above theorem involves an inexplicit constant $C$. However, lower bounds including explicit numerical constants are available from the proof in Section 5.3.

Let us make a few comments.

1. From an asymptotic point of view, (16) allows one to recover the result established by Li, namely that the radius of an honest confidence ball (normalized by $\sqrt{n}$) cannot converge toward 0 faster than $n^{-1/4}$. We also get that the thus normalized radius converges toward 0 only if $\eta = \eta(n)$ does and then the rate cannot be better than $\max\{\sqrt{\eta(n)},n^{-1/4}\}$.

2. When $\eta = 0$ and $\mathcal{D}_m \le n/2$ we derive from (15) that

$r_m^2 \ge C\max\{\mathcal{D}_m,\sqrt{n}\}\sigma^2$

for some constant $C$ depending on $\alpha$ and $\beta$ only. This lower bound is of the same order as the upper bound on $\rho_m^2$ established in Proposition 2.1 provided that $\beta_m$ is free from $n$. This is the case if $\beta_m = \beta/|\mathcal{M}_n|$ and if the cardinality of the collection, $|\mathcal{M}_n|$, does not depend on $n$. The procedure is then optimal in the sense given by Lepski (1999).
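
The effect of the variance uncertainty $\eta$ on the lower bound (16) is easy to tabulate. The Python sketch below computes the orders of magnitude only, ignoring the constant $C$ and the factor $\tau$. The function names and the sample values of $n$ and $\eta$ are illustrative assumptions.

```python
import math


def sq_radius_lower_order(n, eta):
    """max{eta * n, sqrt(n)}: order of the lower bound (16) on r^2 / tau^2."""
    return max(eta * n, math.sqrt(n))


def normalized_radius_lower_order(n, eta):
    """Order of the radius divided by sqrt(n), that is max{sqrt(eta), n^(-1/4)}."""
    return math.sqrt(sq_radius_lower_order(n, eta) / n)


rows = [
    (n, eta, normalized_radius_lower_order(n, eta))
    for n in (10 ** 2, 10 ** 4, 10 ** 6)
    for eta in (0.0, 0.04)
]
```

With $\eta = 0$ the normalized bound decreases like $n^{-1/4}$, whereas with $\eta = 0.04$ it stalls at $\sqrt{\eta} = 0.2$ once $n$ exceeds $1/\eta^2$, however large the sample.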
A natural idea to establish a confidence ball around $f$ when the true variance is unknown is to use the construction of the previous section and to replace the variance $\sigma^2$ by the upper bound $\tau^2$, this latter quantity being connected "intuitively" to the least favorable situation where the level of the noise is maximal. Unfortunately, Theorem 3.1 says that such a construction cannot lead to a confidence ball as changing $\sigma$ into $\tau$ would only affect the order of magnitude of the radius by a factor $\tau/\sigma$, which would be contradictory with (16). In the next section, we show how to modify our previous construction (with a known $\sigma$) in view of obtaining a confidence ball whatever the values of $f$ and $\sigma \in I$.

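3.2. Construction of a confidence ball. In this section we build a confidence ball under the information that $\sigma$ belongs to $I$. The following result holds.

Theorem 3.2. Let $\sigma \in I$ and assume that Assumption 2.1 is fulfilled. Consider the construction of $(\tilde f,\hat\rho)$ described in Section 2.2 with the following definitions for the $\rho_m$'s and $\mathcal{A}$: if $m = n$, then

$\rho_n^2 = q_{0,n}(\beta_n)\tau^2$;

if $m \in \mathcal{M}_n\setminus\{n\}$ and $\mathcal{D}_m \ne 0$,

$\rho_m^2 = \sup_{z\ge 0,\ \sigma\in I}\left[z\sigma^2 + q_{0,\mathcal{D}_m}\left(\frac{\beta_m}{\chi^2_{z,N_m}(q_{0,N_m}(\alpha)\tau^2/\sigma^2)}\wedge 1\right)\sigma^2\right]$;

if $m \in \mathcal{M}_n\setminus\{n\}$ and $\mathcal{D}_m = 0$,

$\rho_m = \inf\left\{x > 0,\ \sup_{\sigma\in I}\chi^2_{x^2/\sigma^2,n}(q_{0,n}(\alpha)\tau^2/\sigma^2) \le \beta_m\right\}$

and

$\mathcal{A} = \{m \in \mathcal{M}_n,\ \|Y - \hat f_m\|^2 \le q_{0,N_m}(\alpha)\tau^2\}$.

The region $\mathcal{B}(\tilde f,\hat\rho)$ is a confidence ball with probability of coverage $1-\beta$; that is, (10) is satisfied. Moreover, for each $m \in \mathcal{M}_n$,

(17) $\inf_{f\in S_m}\mathbb{P}_{f,\sigma}[\hat\rho \le \rho_m] \ge 1-\alpha$.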

An upper bound for $\rho_m$ is given by the following proposition.

Proposition 3.1. Assume that, for all $m \in \mathcal{M}_n\setminus\{n\}$, $\mathcal{D}_m \le n/2$. There exists some constant $C$ depending on $\alpha$ only such that, for all $m \in \mathcal{M}_n$,

$\rho_m^2 \le C\max\{\eta n,\ \mathcal{D}_m,\ \sqrt{n\log(1/\beta_m)},\ \log(1/\beta_m)\}\tau^2$.

From an asymptotic point of view, we derive from Theorem 3.1 the optimality of the procedure whenever the cardinality of the collection $|\mathcal{M}_n|$ does not depend on $n$ by taking $\beta_m = \beta/|\mathcal{M}_n|$ for all $m \in \mathcal{M}_n$. For more general collections, the procedure is also optimal for those $m \in \mathcal{M}_n$ for which $\beta_m$ does not decrease with $n$.

4. Illustrative numerical examples. In this section we apply our procedure in three examples. In the sequel, the number of observations is $n = 1000$. We choose $\beta = 10\%$ and $\alpha = 20\%$. The $\varepsilon_i$'s are standard i.i.d. Gaussian random variables and we assume that the variance is known, that is, $\sigma^2 = 1$. We set $x_i = i/n$ for $i = 1,\ldots,n$ and define the vector $f$ as $(F(x_1),\ldots,F(x_n))'$, where $F$ is one of the following functions on $[0,1]$:

$F_1(x) = \cos(2\pi x)$,

$F_2(x) = \cos(2\pi x) + 0.3\sin(20\pi x)$,

$F_3(x) = \begin{cases} 1.5, & \text{if } 0 \le x < 0.3,\\ 0.5, & \text{if } 0.3 \le x < 0.6,\\ 2, & \text{if } 0.6 \le x < 0.8,\\ 0, & \text{else.}\end{cases}$

For each function $F \in \{F_1,F_2,F_3\}$, Figure 1 shows $F$ with one set of simulated data.

For each $m \ge 1$, we define $\mathcal{F}_m$ as the linear span generated by the constant function on $[0,1]$, $\phi_0 = 1$, together with the sine and cosine functions $\cos(2\pi jx)$, $\sin(2\pi jx)$ for $j = 1,\ldots,m$. For each $m \ge 1$, we define $S_m$ as the linear space

$S_m = \{(F(x_1),\ldots,F(x_n))',\ F \in \mathcal{F}_m\}$.

We take

$\mathcal{M}_n = \{2^k,\ k = 1,\ldots,K_n\}\cup\{n\}$,

with $K_n = 8$. The number $K_n$ is chosen such that $\dim(S_{2^{K_n}}) < n$. We choose $\beta_n = \beta 2^{-K_n}$ and for each $k = 1,\ldots,K_n$, $\beta_{2^k} = \beta 2^{-k}$.
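
The collection of models and weights used in this section can be written down directly. The Python sketch below builds the index set, the weights and the dimensions. The dimension formula $2m+1$ (one constant, $m$ cosines and $m$ sines, assumed linearly independent on the design points) and all variable names are assumptions of this illustration, although the resulting dimensions agree with those reported in Table 1.

```python
import math

n, beta, K_n = 1000, 0.10, 8

# Design points x_i = i/n and the sampled vector f for the function F_2.
x = [i / n for i in range(1, n + 1)]


def F2(t):
    return math.cos(2 * math.pi * t) + 0.3 * math.sin(20 * math.pi * t)


f = [F2(t) for t in x]

# Index set M_n = {2^k, k = 1..K_n} U {n}.
indices = [2 ** k for k in range(1, K_n + 1)] + [n]


def dimension(m):
    """Dimension of S_m: the whole space for m = n, else 2m + 1 trigonometric functions."""
    return n if m == n else 2 * m + 1


dims = [dimension(m) for m in indices]

# Weights beta_{2^k} = beta * 2^(-k) and beta_n = beta * 2^(-K_n).
weights = {2 ** k: beta * 2.0 ** (-k) for k in range(1, K_n + 1)}
weights[n] = beta * 2.0 ** (-K_n)
total_weight = sum(weights.values())
```

The weights add up to $\beta$ exactly, so Assumption 2.1 holds with equality, and they favor the low-dimensional spaces.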

We made 100 simulations. For each simulation and each function $F \in \{F_1,F_2,F_3\}$ we consider $\hat m(F)$, the smallest integer $m \in \mathcal{M}_n$ such that the hypothesis "$f \in S_m$" is accepted. In Table 1 we have displayed for each $F$ and $m \in \mathcal{M}_n$ the number of simulations for which $\hat m(F) = m$.

Let us now comment on Table 1. Note that the radii $\rho_m$'s are increasing with $\mathcal{D}_m$. This comes from our choices of $\beta_m$'s, which are more favorable to linear spaces with small dimensions. Thus, the smaller is the dimension of $S_m$, the sharper is the radius of the confidence ball when the hypothesis "$f \in S_m$" is accepted.

Fig. 1.

Table 1

Indices   Dimensions        Squared radii    Number of simulations with $\hat m(F) = m$
$m$       $\mathcal{D}_m$   $\rho_m^2/n$     $F_1$    $F_2$    $F_3$
2         5                 0.118            82       47       0
4         9                 0.136            1        0        8
8         17                0.155            0        1        20
16        33                0.181            1        33       28
32        65                0.222            1        3        17
64        129               0.293            4        5        6
128       257               0.425            1        1        7
256       513               0.681            4        4        5
1000      1000              1.157            6        6        9
Function $F_1$ belongs to $\mathcal{F}_2$. As expected, the hypothesis "$f \in S_2$" is accepted for around 80 simulations, as $\alpha = 20\%$. This choice of $\alpha$ is arbitrary. By taking $\alpha$ smaller, the hypothesis will be accepted more often but on the other hand the radius of the confidence ball will be larger. For example, the value of $\rho_2^2/n$, respectively, equals 0.149 and 0.160 for $\alpha = 15\%$ and $\alpha = 10\%$.

Function $F_2$ is a perturbation of $F_1$. The test "$f \in S_2$" is accepted for 47 simulations even though $F_2$ does not belong to $\mathcal{F}_2$ but $\mathcal{F}_{16}$. However, for these 47 simulations the procedure has taken advantage of the closeness of $F_2$ to $\mathcal{F}_2$ to provide a sharper confidence ball than the one we would obtain if $\hat m(F_2)$ were equal to 16. We emphasize that the procedure provides a confidence ball with probability of coverage 90% even though the "right" model for $F_2$ (namely $\mathcal{F}_{16}$) is accepted for only 33 simulations. This comes from the fact that the radius of the confidence ball takes into account a possible bias between the true $f$ and the linear space accepted by the test. Finally note that, as expected from Theorem 2.1, the radius of the confidence ball exceeds $\rho_{16}$ for 19 simulations since $F_2$ belongs to $\mathcal{F}_{16}$.

Function $F_3$ was considered in Beran and Dümbgen (1998) in one simulated example. In their simulation, the squared radius (with respect to $\|\cdot\|/\sqrt{n}$) of the confidence ball was obtained by bootstrap and was equal to 0.144. We obtain a radius of the same order for 28 = 8 + 20 simulations.
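
The counts quoted in these comments can be cross-checked against Table 1 with a few lines of Python. The dictionary below transcribes the table, reading its blank cells as 0 (an assumption, consistent with each column summing to the 100 simulations). The variable names are hypothetical.

```python
# Table 1: index m -> (count for F_1, count for F_2, count for F_3).
table = {
    2: (82, 47, 0),
    4: (1, 0, 8),
    8: (0, 1, 20),
    16: (1, 33, 28),
    32: (1, 3, 17),
    64: (4, 5, 6),
    128: (1, 1, 7),
    256: (4, 4, 5),
    1000: (6, 6, 9),
}

# Each simulation selects exactly one index, so every column should total 100.
column_totals = [sum(row[j] for row in table.values()) for j in range(3)]

# Simulations for F_2 where the selected index is larger than 16,
# i.e. where the radius exceeds rho_16.
f2_above_16 = sum(row[1] for m, row in table.items() if m > 16)

# Simulations for F_3 with squared radii 0.136 or 0.155, close to the value 0.144.
f3_near_reference = table[4][2] + table[8][2]
```

The count of 19 for $F_2$ stays below $100\alpha = 20$, in line with inequality (12).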

5. Proofs. Throughout the proofs we repeatedly use the following inequalities on the quantiles of noncentral $\chi^2$ random variables. These inequalities are due to Birgé (2001). For all $u \in\ ]0,1[$, $z \ge 0$, $d \ge 1$,

(18) $q_{z,d}(u) \le z + d + 2\sqrt{(2z+d)\log(1/u)} + 2\log(1/u)$,

(19) $q_{z,d}(1-u) \ge z + d - 2\sqrt{(2z+d)\log(1/u)}$.

In the sequel, $\Pi_m$ for $m \in \mathcal{M}_n$ denotes the orthogonal projector onto $S_m$.
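
Inequalities (18) and (19) can be sanity-checked in the one case where the quantile is explicit: for $z = 0$ and $d = 2$ the survival function is $e^{-x/2}$, so $q_{0,2}(u) = 2\log(1/u)$. The Python sketch below performs this check on a few values of $u$. The function names are hypothetical and the check covers this special case only.

```python
import math


def upper_bound(z, d, u):
    """Right-hand side of (18): an upper bound on q_{z,d}(u)."""
    x = math.log(1.0 / u)
    return z + d + 2.0 * math.sqrt((2.0 * z + d) * x) + 2.0 * x


def lower_bound(z, d, u):
    """Right-hand side of (19): a lower bound on q_{z,d}(1 - u)."""
    x = math.log(1.0 / u)
    return z + d - 2.0 * math.sqrt((2.0 * z + d) * x)


def exact_quantile_2df(u):
    """q_{0,2}(u) = 2 log(1/u) for a central chi-square with 2 degrees of freedom."""
    return 2.0 * math.log(1.0 / u)


checks = [
    (
        u,
        exact_quantile_2df(u) <= upper_bound(0.0, 2, u),
        lower_bound(0.0, 2, u) <= exact_quantile_2df(1.0 - u),
    )
    for u in (0.01, 0.05, 0.1, 0.5, 0.9)
]
```

Both bounds hold on this grid, with the upper bound exceeding the exact quantile by $2 + 2\sqrt{2\log(1/u)}$ in this special case.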

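5.1. Proof of Lemma 2.1. For simplicity, let us take $\sigma^2 = 1$. If $\mathcal{D} = 0$, then $\hat f = \Pi_S Y = 0$, and hence

(20) $\mathbb{P}_{f,1}[\phi(Y) = 0,\ \|f - \hat f\| > \rho] = \mathbb{P}_{f,1}[\|Y\|^2 \le q_{0,n}(\alpha),\ \|f\| > \rho]$.

If $\|f\| \le \rho$ this probability equals 0. Otherwise, $\|f\| > \rho$. Since $\|Y\|^2$ is distributed as a $\chi^2$ with noncentrality parameter $\|f\|^2$ and $n$ degrees of freedom, it follows from the definition of $\rho$ that the right-hand side of (20) is not larger than $\beta$.

Now let $\mathcal{D} \ne 0$. For all $f \in \mathbb{R}^n$, note that $\|\Pi_S\varepsilon\|^2$ and $\|Y - \Pi_S Y\|^2 = \|f - \Pi_S f + \varepsilon - \Pi_S\varepsilon\|^2$ are independent random variables. By setting $z = \|f - \Pi_S f\|^2$, we deduce

$\mathbb{P}_{f,1}[\phi(Y) = 0,\ \|f - \hat f\| > \rho]$

$= \mathbb{P}_{f,1}[\|Y - \Pi_S Y\|^2 \le q_{0,n-\mathcal{D}}(\alpha),\ \|f - \Pi_S f\|^2 + \|\Pi_S\varepsilon\|^2 > \rho^2]$

$= \chi^2_{z,n-\mathcal{D}}(q_{0,n-\mathcal{D}}(\alpha))\,(1 - \chi^2_{0,\mathcal{D}}(\rho^2 - z))$.

If $\chi^2_{z,n-\mathcal{D}}(q_{0,n-\mathcal{D}}(\alpha)) \le \beta$, then the result is established. Otherwise $z \in \mathcal{Z}$ and, by definition of $\rho$,

$\rho^2 - z \ge q_{0,\mathcal{D}}\left(\frac{\beta}{\chi^2_{z,n-\mathcal{D}}(q_{0,n-\mathcal{D}}(\alpha))}\right)$,

which leads to

$(1 - \chi^2_{0,\mathcal{D}}(\rho^2 - z)) \le \frac{\beta}{\chi^2_{z,n-\mathcal{D}}(q_{0,n-\mathcal{D}}(\alpha))}$

and the result follows.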



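5.2. Proof of Theorems 2.1 and 3.2. Theorem 2.1 being a straightforward consequence of Theorem 3.2 by taking $\eta = 0$, we only prove Theorem 3.2.

Let us first prove (17). The result is clear for $m = n$ as by definition $\hat\rho \le \rho_n$. Let us fix some $m \in \mathcal{M}_n\setminus\{n\}$. We derive from the definition of $\hat\rho$ that

$\mathbb{P}_{f,\sigma}[\hat\rho > \rho_m] \le \mathbb{P}_{f,\sigma}[m \notin \mathcal{A}]$

$= \mathbb{P}_{f,\sigma}[\|Y - \hat f_m\|^2 > q_{0,N_m}(\alpha)\tau^2]$

$\le \mathbb{P}_{f,\sigma}[\|Y - \hat f_m\|^2 > q_{0,N_m}(\alpha)\sigma^2]$

as $\tau \ge \sigma$. We conclude by noting that, for $f \in S_m$, $\|Y - \hat f_m\|^2/\sigma^2$ is distributed as a $\chi^2$ with $N_m$ degrees of freedom.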

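We shall now show something that is stronger than (10), namely that

$\mathbb{P}_{f,\sigma}\left[f \notin \bigcap_{m\in\mathcal{A}}\mathcal{B}(\hat f_m,\rho_m)\right] \le \beta$.

For all $f \in \mathbb{R}^n$:

$\mathbb{P}_{f,\sigma}\left[f \notin \bigcap_{m\in\mathcal{A}}\mathcal{B}(\hat f_m,\rho_m)\right]$

$= \mathbb{P}_{f,\sigma}[\exists m \in \mathcal{A},\ \|f - \hat f_m\| > \rho_m]$

$\le \sum_{m\in\mathcal{M}_n}\mathbb{P}_{f,\sigma}[\|f - \hat f_m\| > \rho_m,\ m \in \mathcal{A}]$

$= \sum_{m\in\mathcal{M}_n}\mathbb{P}_{f,\sigma}[\|f - \hat f_m\| > \rho_m,\ \|Y - \hat f_m\|^2 \le q_{0,N_m}(\alpha)\tau^2]$.

Since $\sum_{m\in\mathcal{M}_n}\beta_m \le \beta$, it is enough to prove that, for each $m \in \mathcal{M}_n$, the probability

$P_{f,\sigma}(m) = \mathbb{P}_{f,\sigma}[\|f - \hat f_m\| > \rho_m,\ \|Y - \hat f_m\|^2 \le q_{0,N_m}(\alpha)\tau^2]$

is not greater than $\beta_m$.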

If $m = n$, this is clear since $\hat f_n = Y$ and, for $\tau^2 \ge \sigma^2$,

$\mathbb{P}_{f,\sigma}(n) = \mathbb{P}_{f,\sigma}[\sigma^2\|\varepsilon\|^2 > q_{0,n}(\beta_n)\tau^2] \le \beta_n.$

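Let us now prove the inequality when $D_m = 0$. In this case $\hat f_m = 0$. If
$\|f\| \le \rho_m$, we have $\mathbb{P}_{f,\sigma}(m) = 0$ and thus the inequality is true. Otherwise
$\|f\| > \rho_m$ and as, for all $u > 0$, $z \mapsto \chi_{z,n}(u)$ is nonincreasing with $z$ we get,
by definition of $\rho_m$,

$\mathbb{P}_{f,\sigma}(m) = \chi_{\|f\|^2/\sigma^2,\,n}(q_{0,n}(\alpha)\tau^2/\sigma^2) \le \chi_{\rho_m^2/\sigma^2,\,n}(q_{0,n}(\alpha)\tau^2/\sigma^2) \le \beta_m.$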



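Let us now fix some $m \in \mathcal{M}_n \setminus \{n\}$ such that $D_m \ne 0$ and set
$z = \|f - \Pi_m f\|^2/\sigma^2$. Note that the random variables

$\frac{\|f - \hat f_m\|^2}{\sigma^2} = \frac{\|f - \Pi_m f - \sigma\Pi_m\varepsilon\|^2}{\sigma^2} = z + \|\Pi_m\varepsilon\|^2$

and

$\frac{\|Y - \hat f_m\|^2}{\sigma^2} = \frac{\|f - \Pi_m f + \sigma(\varepsilon - \Pi_m\varepsilon)\|^2}{\sigma^2}$

are independent and that the second one is distributed as a noncentral $\chi^2$
with noncentrality parameter $z$ and $N_m$ degrees of freedom. Therefore, we
get

(21)  $\mathbb{P}_{f,\sigma}(m) = \left(1 - \chi_{0,D_m}\left(\frac{\rho_m^2}{\sigma^2} - z\right)\right)\chi_{z,N_m}\left(q_{0,N_m}(\alpha)\frac{\tau^2}{\sigma^2}\right).$

We deduce from the definition of $\rho_m$ that, for all $\sigma \in I$ and $z \ge 0$, the
right-hand side of (21) is not larger than $\beta_m$, which leads to the result.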
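
The two decompositions used to obtain (21) can be checked numerically. The sketch below is only an illustration under simplifying assumptions that are ours, not the paper's: $S_m$ is taken to be the span of the first `d` canonical basis vectors, so that the projection simply keeps the first `d` coordinates, and the helper names (`project_first_coords`, `decomposition`) are hypothetical.

```python
import random


def project_first_coords(v, d):
    """Orthogonal projection onto span(e_1, ..., e_d): keep the first d coordinates."""
    return [x if i < d else 0.0 for i, x in enumerate(v)]


def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(x * x for x in v)


def decomposition(n=8, d=3, sigma=0.7, seed=0):
    """Return both sides of the two identities preceding (21) for one random draw."""
    rng = random.Random(seed)
    f = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [fi + sigma * ei for fi, ei in zip(f, eps)]

    f_hat = project_first_coords(y, d)   # estimator: projection of Y
    pf = project_first_coords(f, d)      # projection of f
    pe = project_first_coords(eps, d)    # projection of the noise

    # z = ||f - Pi f||^2 / sigma^2 (noncentrality parameter)
    z = sq_norm([a - b for a, b in zip(f, pf)]) / sigma ** 2

    # ||f - f_hat||^2 / sigma^2 should equal z + ||Pi eps||^2
    loss_lhs = sq_norm([a - b for a, b in zip(f, f_hat)]) / sigma ** 2
    loss_rhs = z + sq_norm(pe)

    # ||Y - f_hat||^2 / sigma^2 should equal ||f - Pi f + sigma (eps - Pi eps)||^2 / sigma^2
    res_lhs = sq_norm([a - b for a, b in zip(y, f_hat)]) / sigma ** 2
    res_rhs = sq_norm(
        [(a - b) + sigma * (e - p) for a, b, e, p in zip(f, pf, eps, pe)]
    ) / sigma ** 2
    return loss_lhs, loss_rhs, res_lhs, res_rhs
```

The first quantity depends on the noise only through its projection onto $S_m$ and the second only through the orthogonal component, which is why the two are independent in the Gaussian model.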

5.3. Proof of Theorem 3.1. The principle of the proof leading to the lower
bounds on the $r_m$'s is due to Lepski. However, the following nonasymptotic
inequalities are to our knowledge new. In the sequel we set $N_m = n - D_m$.
Let us now fix some $m \in \mathcal{M}_n$; we divide the proof into consecutive claims.

Claim 1. If $\alpha + \beta < 1 - \exp(-1/36)$, then

$r_m^2 \ge \left(\frac{D_m}{27} - \sqrt{\mathcal{L}_1 D_m}\right)\tau^2,$

where $\mathcal{L}_1 = -4\log(1 - \alpha - \beta)/81$.
Note that the claim is clear when $D_m = 0$; we shall thus restrict ourselves
to the case $D_m \ge 1$. The proof relies on two lemmas. In the first one, we
show that, under the assumption of Theorem 3.1, with probability close to
1 the Euclidean distance between $f \in S_m$ and its estimator $\hat f$ is not greater
than $r_m$.

Lemma 5.1. Let the pair $(\hat f, \hat r)$ satisfy the assumption of Theorem 3.1.
Then, for all $m \in \mathcal{M}_n$, $f \in S_m$ and $\sigma \in I$,

(22)  $\mathbb{P}_{f,\sigma}[\|f - \hat f\| > r_m] \le \alpha + \beta.$

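Proof. For all $f \in S_m$,

$\mathbb{P}_{f,\sigma}[\|f - \hat f\| > r_m]$
$\quad \le \mathbb{P}_{f,\sigma}[\|f - \hat f\| > r_m,\ r_m \ge \hat r] + \mathbb{P}_{f,\sigma}[\|f - \hat f\| > r_m,\ \hat r > r_m]$
$\quad \le \mathbb{P}_{f,\sigma}[\|f - \hat f\| > \hat r] + \mathbb{P}_{f,\sigma}[\hat r > r_m]$

and we conclude thanks to (13) and (14). □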

The second lemma shows that such a property of the estimator $\hat f$ is
possible only if $r_m$ is large enough.

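Lemma 5.2. Let $S$ be a linear subspace of $\mathbb{R}^n$ of dimension $D \ge 1$ and
$\delta$ a positive number such that $\delta < 1 - \exp[-D/36]$. If $\hat f$ is an estimator of $f$
in (1) which satisfies, for all $f \in S$,

(23)  $\mathbb{P}_{f,\sigma}[\|f - \hat f\| > v(\delta)] \le \delta,$

then

$v^2(\delta) \ge \left(\frac{D}{27} - \frac{2}{9}\sqrt{D\log(1/(1-\delta))}\right)\sigma^2.$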

In light of Lemma 5.1, the claim derives from Lemma 5.2 by taking $S = S_m$,
$\delta = \alpha + \beta$ and $\sigma = \tau$. Let us now turn to the proof of Lemma 5.2.
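
To make the passage from Lemma 5.2 to Claim 1 concrete, here is a minimal sketch. It assumes that the bound of Lemma 5.2 reads $(D/27 - (2/9)\sqrt{D\log(1/(1-\delta))})\sigma^2$ and that the constant of Claim 1 is $-4\log(1-\alpha-\beta)/81$, which is the reading consistent with the constants $\lambda = \sqrt{2}/3$ and $D/36$ appearing in the proof. The function names are ours.

```python
import math


def lemma_5_2_bound(dim, delta, sigma2=1.0):
    """Lower bound on v^2(delta): (D/27 - (2/9) sqrt(D log(1/(1-delta)))) sigma^2."""
    return (dim / 27.0
            - (2.0 / 9.0) * math.sqrt(dim * math.log(1.0 / (1.0 - delta)))) * sigma2


def claim_1_bound(dim, alpha, beta, tau2=1.0):
    """Lower bound on r_m^2 from Claim 1, with L1 = -4 log(1 - alpha - beta) / 81."""
    l1 = -4.0 * math.log(1.0 - alpha - beta) / 81.0
    return (dim / 27.0 - math.sqrt(l1 * dim)) * tau2


def bound_is_positive(dim, delta):
    """Condition delta < 1 - exp(-D/36) under which the bound is positive."""
    return delta < 1.0 - math.exp(-dim / 36.0)
```

With $\delta = \alpha + \beta$ and $\sigma^2 = \tau^2$ the two functions return the same value, since $(2/9)^2\log(1/(1-\delta)) = -4\log(1-\delta)/81$.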

Proof of Lemma 5.2. The Gaussian law being invariant by orthogonal
transformation, with no loss of generality, we assume that $S$ is the linear
span generated by $e_1, \ldots, e_D$, the $D$ first vectors of the canonical basis of
$\mathbb{R}^n$. Moreover, by homogeneity, we assume that $\sigma^2 = 1$. Let $v(\delta)$ be some
positive number satisfying

(24)  $v^2(\delta) < \frac{D}{27} - \frac{2}{9}\sqrt{-D\log(1-\delta)}.$
Note that the right-hand side of (24) is positive for $\delta < 1 - \exp[-D/36]$. We
prove Lemma 5.2 by showing that, for all estimators $\hat f$ with values in $\mathbb{R}^n$,

$\inf_{f\in S}\mathbb{P}_{f,1}[\|f - \hat f\|^2 \le v^2(\delta)] < 1 - \delta.$

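Let $\xi_1, \ldots, \xi_D$ be Rademacher random variables (i.e., $\mathbb{P}[\xi_i = \pm 1] = 1/2$) which
are independent of $Y$ and set $f(\xi) = \lambda\sum_{i=1}^{D}\xi_i e_i$, where $\lambda$ denotes some
positive number to be chosen later on. Using that

$\frac{d\mathbb{P}_{f(\xi),1}}{d\mathbb{P}_{0,1}}(y) = \exp\left(-\frac{\lambda^2 D}{2} + \lambda\sum_{i=1}^{D}\xi_i y_i\right)$

and the fact that $f(\xi) \in S$, we have

$\inf_{f\in S}\mathbb{P}_{f,1}[\|f - \hat f\|^2 \le v^2(\delta)]$
$\quad \le \mathbb{E}_{0,1}\left[\mathbf{1}\left\{\sum_{i=1}^{D}(\lambda\xi_i - \hat f_i)^2 \le v^2(\delta)\right\}\exp\left(-\frac{\lambda^2 D}{2} + \lambda\sum_{i=1}^{D}\xi_i Y_i\right)\right].$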



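Note that $\hat f = \hat f(Y)$ satisfies

$\sum_{i=1}^{D}(\lambda\xi_i - \hat f_i)^2 \ge \lambda^2\sum_{i=1}^{D}\mathbf{1}\{\xi_i\hat f_i(Y) \le 0\}$

and thus, setting

$N(\xi, \hat f) = \lambda^2\sum_{i=1}^{D}\mathbf{1}\{\xi_i\hat f_i(Y) \le 0\},$

we derive

$\inf_{f\in S}\mathbb{P}_{f,1}[\|f - \hat f\|^2 \le v^2(\delta)] \le \mathbb{E}_{0,1}\left[\mathbf{1}\{N(\xi,\hat f) \le v^2(\delta)\}\exp\left(-\frac{\lambda^2 D}{2} + \lambda\sum_{i=1}^{D}\xi_i Y_i\right)\right].$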



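By averaging with respect to $\xi$ and using Fubini's theorem we get

(25)  $\inf_{f\in S}\mathbb{P}_{f,1}[\|f - \hat f\|^2 \le v^2(\delta)] \le e^{-\lambda^2 D/2}\,\mathbb{E}_{0,1}\left[\mathbb{E}_{\xi}\left[\mathbf{1}\{N(\xi,\hat f) \le v^2(\delta)\}\exp\left(\lambda\sum_{i=1}^{D}\xi_i Y_i\right)\right]\right].$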
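By the Cauchy-Schwarz inequality,

$\mathbb{E}_{\xi}\left[\mathbf{1}\{N(\xi,\hat f) \le v^2(\delta)\}\exp\left(\lambda\sum_{i=1}^{D}\xi_i Y_i\right)\right]$
$\quad \le \mathbb{P}_{\xi}^{1/2}[N(\xi,\hat f) \le v^2(\delta)]\ \mathbb{E}_{\xi}^{1/2}\left[\exp\left(2\lambda\sum_{i=1}^{D}\xi_i Y_i\right)\right]$
$\quad = \mathbb{P}_{\xi}^{1/2}[N(\xi,\hat f) \le v^2(\delta)]\prod_{i=1}^{D}\cosh^{1/2}(2\lambda Y_i),$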



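which together with (25) gives

(26)  $\inf_{f\in S}\mathbb{P}_{f,1}[\|f - \hat f\|^2 \le v^2(\delta)] \le e^{-\lambda^2 D/2}\,\mathbb{E}_{0,1}\left[\mathbb{P}_{\xi}^{1/2}[N(\xi,\hat f) \le v^2(\delta)]\prod_{i=1}^{D}\cosh^{1/2}(2\lambda Y_i)\right].$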



Conditionally on $Y$, the random variable $N(\xi, \hat f)/\lambda^2$ is a sum of $D$
independent random variables with values in $\{0, 1\}$. Thus by Hoeffding's inequality
we obtain that, for all $t > 0$,

$\mathbb{P}_{\xi}\left[N(\xi,\hat f) \le \mathbb{E}_{\xi}[N(\xi,\hat f)] - \lambda^2\sqrt{Dt}\right] \le e^{-2t}.$

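Taking $t = \lambda^2 D/2 - \log(1-\delta)$ and noting that $\mathbb{E}_{\xi}[N(\xi,\hat f)] \ge \lambda^2 D/2$ we get
from (24) that

$\mathbb{E}_{\xi}[N(\xi,\hat f)] - \lambda^2\sqrt{Dt} \ge \frac{\lambda^2 D}{2} - \lambda^2\sqrt{\frac{\lambda^2 D^2}{2} - D\log(1-\delta)}$
$\quad \ge \left(\frac{\lambda^2}{2} - \frac{\lambda^3}{\sqrt{2}}\right)D - \lambda^2\sqrt{-D\log(1-\delta)}$

and thus, for $\lambda = \sqrt{2}/3$,

$\mathbb{E}_{\xi}[N(\xi,\hat f)] - \lambda^2\sqrt{Dt} \ge v^2(\delta).$

Consequently,

$\mathbb{P}_{\xi}^{1/2}[N(\xi,\hat f) \le v^2(\delta)] \le e^{-t} = (1-\delta)e^{-\lambda^2 D/2}.$

Now using that

$\mathbb{E}_{0,1}\left[\prod_{i=1}^{D}\cosh^{1/2}(2\lambda Y_i)\right] = \prod_{i=1}^{D}\mathbb{E}_{0,1}[\cosh^{1/2}(2\lambda Y_i)] < \mathbb{E}_{0,1}^{D/2}[\cosh(2\lambda Y_1)] = \exp[\lambda^2 D],$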
we derive from (26) that

$\inf_{f\in S}\mathbb{P}_{f,1}[\|f - \hat f\|^2 \le v^2(\delta)] < 1 - \delta,$

which concludes the proof. □
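
The numerical constants of the proof can be verified directly. The following sketch checks, for $\lambda = \sqrt{2}/3$, that $\lambda^2/2 - \lambda^3/\sqrt{2} = 1/27$ and $\lambda^2 = 2/9$ (the coefficients we read in (24)), that $\mathbb{E}[\cosh(2\lambda Y)] = \exp(2\lambda^2)$ for a standard Gaussian $Y$, and that the three exponential factors combine to $1-\delta$. The quadrature routine is a plain trapezoidal rule of our own choosing, not something taken from the paper.

```python
import math

LAMBDA = math.sqrt(2.0) / 3.0  # the value of lambda chosen in the proof


def mean_cosh_gaussian(a, half_width=12.0, steps=20_000):
    """Trapezoidal approximation of E[cosh(a * Y)] for Y ~ N(0, 1)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        y = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.cosh(a * y) * math.exp(-0.5 * y * y)
    return total * h / math.sqrt(2.0 * math.pi)


def final_bound(dim, delta, lam=LAMBDA):
    """e^{-lam^2 D/2} * (1 - delta) e^{-lam^2 D/2} * e^{lam^2 D}, which equals 1 - delta."""
    half = math.exp(-lam ** 2 * dim / 2.0)
    return half * (1.0 - delta) * half * math.exp(lam ** 2 * dim)
```

The cancellation in `final_bound` holds for every $\lambda$; the particular value $\sqrt{2}/3$ only matters for the comparison with (24).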

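Claim 2. If $\alpha + 2\beta < 1 - \exp(-1/4)$, then

(27)  $9r_m^2 \ge \max\left\{\sqrt{\mathcal{L}_2 N_m},\ (N_m - 2\sqrt{\mathcal{L}_3 N_m})\eta\right\}\tau^2$

with $\mathcal{L}_2 = 2\log(1 + 4(1 - \alpha - 2\beta)^2)$ and $\mathcal{L}_3 = -\log(1 - \alpha - 2\beta)$.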

The claim is clear when $N_m = 0$; thus we only consider the case where
$N_m \ge 1$. Again, the proof relies on two lemmas. The first one shows that if
the pair $(\hat f, \hat r)$ satisfies the assumptions of Theorem 3.1, then it is possible to
build a level $(\alpha + \beta)$-test of "$f \in S_m$" against "$f \in \mathbb{R}^n \setminus S_m$" which achieves
the power $1 - \beta$ on the complement of a ball of radius $3r_m$. Namely, the
following holds:

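Lemma 5.3. Let $(\hat f, \hat r)$ be a pair of random variables with values in $\mathbb{R}^n \times \mathbb{R}_+$
satisfying the assumptions of Theorem 3.1. The test of hypothesis "$f \in S_m$"
against the alternative "$f \notin S_m$" associated with the critical region

(28)  $\mathcal{R} = \{\hat r > r_m\} \cup \{\|\hat f - \Pi_m\hat f\| > 2\hat r\}$

has the following properties: for all $\sigma \in I$,

(29)  $\sup_{f\in S_m}\mathbb{P}_{f,\sigma}[\mathcal{R}] \le \alpha + \beta,$

and for all $f$ satisfying $\|f - \Pi_m f\| > 3r_m$,

(30)  $\mathbb{P}_{f,\sigma}[\mathcal{R}] \ge 1 - \beta.$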

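Proof. Let us show (29). First note that, for all $f \in S_m$,

(31)  $\|\hat f - \Pi_m\hat f\| \le \|\hat f - f\| + \|f - \Pi_m\hat f\| \le 2\|f - \hat f\|.$

By (13), (14) and (31), for all $f \in S_m$ we have

$\mathbb{P}_{f,\sigma}[\mathcal{R}] \le \mathbb{P}_{f,\sigma}[\hat r > r_m] + \mathbb{P}_{f,\sigma}[\|\hat f - \Pi_m\hat f\| > 2\hat r]$
$\quad \le \alpha + \mathbb{P}_{f,\sigma}[2\|f - \hat f\| > 2\hat r] \le \alpha + \beta.$

Let us now show (30). Let $f \in \mathbb{R}^n$ be such that $\|f - \Pi_m f\| > 3r_m$. Since

$\|\hat f - \Pi_m\hat f\| \ge \|f - \Pi_m f\| - \|f - \hat f\| > 3r_m - \|f - \hat f\|,$

we derive that

$\mathbb{P}_{f,\sigma}[\mathcal{R}^c] = \mathbb{P}_{f,\sigma}[\|\hat f - \Pi_m\hat f\| \le 2\hat r,\ \hat r \le r_m]$
$\quad \le \mathbb{P}_{f,\sigma}[\|f - \hat f\| > r_m,\ r_m \ge \hat r]$
$\quad \le \mathbb{P}_{f,\sigma}[\|f - \hat f\| > \hat r] \le \beta.$ □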
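
Both geometric inequalities used in this proof say that the distance to the subspace $S_m$ is a 1-Lipschitz function. A minimal numerical sketch follows, under the assumption (ours, for illustration only) that $S_m$ is spanned by the first `d` canonical basis vectors; all names are hypothetical.

```python
import math
import random


def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))


def project(v, d):
    """Orthogonal projection onto the span of the first d canonical basis vectors."""
    return [x if i < d else 0.0 for i, x in enumerate(v)]


def sub(u, v):
    """Coordinatewise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def check_inequalities(n=6, d=2, trials=1000, seed=1, tol=1e-12):
    """Check the two triangle-type inequalities of the proof on random vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        f_hat = [rng.gauss(0.0, 3.0) for _ in range(n)]
        resid = norm(sub(f_hat, project(f_hat, d)))  # ||f_hat - Pi f_hat||

        # Under the null: f in S_m implies resid <= 2 ||f - f_hat||.
        f_null = project([rng.gauss(0.0, 3.0) for _ in range(n)], d)
        if resid > 2.0 * norm(sub(f_null, f_hat)) + tol:
            return False

        # For arbitrary f: resid >= ||f - Pi f|| - ||f - f_hat||.
        f_any = [rng.gauss(0.0, 3.0) for _ in range(n)]
        dist_f = norm(sub(f_any, project(f_any, d)))
        if resid < dist_f - norm(sub(f_any, f_hat)) - tol:
            return False
    return True
```

A random search of this kind cannot prove the inequalities, but it is a cheap guard against a sign or projection mistake when transcribing the argument.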

We obtain the claim by proving that a test having the properties described
in the previous lemma exists only if $r_m$ is large enough. The inequality

$9r_m^2 \ge \sqrt{\mathcal{L}_2 N_m}\,\tau^2$

derives from Baraud [(2002), Proposition 1]. For the second inequality,

$9r_m^2 \ge (N_m - 2\sqrt{\mathcal{L}_3 N_m})\eta\tau^2,$

we use the following lemma.

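Lemma 5.4. Let $S$ be a linear subspace of $\mathbb{R}^n$ with $\dim(S) = D$ (we set
$N = n - D$) and $\delta$ and $\beta$ be numbers satisfying $0 < \beta + \delta < 1 - \exp(-N/4)$.
Let $\phi(Y)$ be a test function with values in $\{0, 1\}$ satisfying, for all $\sigma \in I$,

(32)  $\sup_{f\in S}\mathbb{P}_{f,\sigma}[\phi(Y) = 1] \le \delta,$

and for all $f \in \mathbb{R}^n$ such that $\|f - \Pi_S f\|^2 \ge \Delta(N, \beta)$,

(33)  $\mathbb{P}_{f,\sigma}[\phi(Y) = 1] \ge 1 - \beta.$

Then

$\Delta(N, \beta) \ge \left(N - 2\sqrt{-N\log(1 - \beta - \delta)}\right)\eta\tau^2.$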

By applying this lemma with $\delta = \alpha + \beta$, $S = S_m$ and $D = D_m$ and the test
described in Lemma 5.3 we obtain the claim.

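Proof of Lemma 5.4. Let $\mathcal{F}$ be the set defined by

$\mathcal{F} = \{f \in \mathbb{R}^n,\ \|\Pi_{S^\perp}f\|^2 \ge \Delta\},$

where $\Delta$ denotes some positive number. To obtain the desired result it is
enough to show that, for

$\Delta < \left(N - 2\sqrt{-N\log(1 - \beta - \delta)}\right)\eta\tau^2,$

we have

(34)  $\inf_{\sigma\in I}\inf_{f\in\mathcal{F}}\mathbb{P}_{f,\sigma}[\phi(Y) = 1] < 1 - \beta.$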
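Since the quantity $\sigma_* = \sqrt{1 - \eta}\,\tau$ belongs to $I$, we have that, for all vectors
$Z \in \mathbb{R}^n$,

$\inf_{\sigma\in I}\inf_{f\in\mathcal{F}}\mathbb{P}_{f,\sigma}[\phi(Y) = 1]$
$\quad \le \mathbb{P}_{Z,\sigma_*}[\phi(Y) = 1]\,\mathbf{1}\{\|\Pi_{S^\perp}Z\|^2 \ge \Delta\} + \mathbf{1}\{\|\Pi_{S^\perp}Z\|^2 < \Delta\}.$

By taking $Z$ as a random variable independent of $Y$ distributed as $\sqrt{\eta}\,\tau\varepsilon$,
we obtain by averaging with respect to $Z$ that

$\inf_{\sigma\in I}\inf_{f\in\mathcal{F}}\mathbb{P}_{f,\sigma}[\phi(Y) = 1] \le \mathbb{E}[\mathbb{P}_{Z,\sigma_*}[\phi(Y) = 1]] + \mathbb{P}[\|\Pi_{S^\perp}Z\|^2 < \Delta].$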

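For the first term of the right-hand side of this inequality, note that
$\mathbb{E}[\mathbb{P}_{Z,\sigma_*}] = \mathbb{P}_{0,\tau}$. As $0 \in S$ and $\tau \in I$, we have

$\mathbb{E}[\mathbb{P}_{Z,\sigma_*}[\phi(Y) = 1]] \le \delta.$

For the second term, note that our upper bound on $\Delta$ ensures that

$\Delta < q_{0,N}(1 - \beta - \delta)\,\eta\tau^2$

by using the lower bound on the quantiles of $\chi^2$ random variables (19). As
the random variable $\|\Pi_{S^\perp}Z\|^2/(\eta\tau^2)$ is distributed as a $\chi^2(N)$, we get

$\mathbb{P}[\|\Pi_{S^\perp}Z\|^2 < \Delta] < 1 - \beta - \delta,$

which concludes the proof. □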
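
The key step of this proof is the mixture identity $\mathbb{E}[\mathbb{P}_{Z,\sigma_*}] = \mathbb{P}_{0,\tau}$: a centered Gaussian mean with variance $\eta\tau^2$ plus independent noise with variance $(1-\eta)\tau^2$ is a centered Gaussian with variance $\tau^2$. The one-coordinate Monte Carlo sketch below illustrates it; the parameter values, the seed and the function name are arbitrary choices of ours.

```python
import math
import random


def simulate_mixture_variance(eta=0.3, tau=2.0, draws=200_000, seed=0):
    """Empirical variance of Y = Z + sigma_star * eps for one coordinate.

    Z ~ N(0, eta * tau^2) plays the role of the random mean and
    sigma_star = sqrt(1 - eta) * tau is the smallest noise level of the interval I.
    """
    rng = random.Random(seed)
    sigma_star = math.sqrt(1.0 - eta) * tau
    total = 0.0
    total_sq = 0.0
    for _ in range(draws):
        z = math.sqrt(eta) * tau * rng.gauss(0.0, 1.0)
        y = z + sigma_star * rng.gauss(0.0, 1.0)
        total += y
        total_sq += y * y
    mean = total / draws
    return total_sq / draws - mean * mean
```

Up to Monte Carlo error the returned value is $\tau^2$, whatever $\eta \in [0, 1]$ is used; this is what allows the null hypothesis at noise level $\tau$ to mimic alternatives at noise level $\sigma_*$.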

Conclusion. By gathering the inequalities of the two claims we get that,
for some constant $C$ depending on $\alpha$ and $\beta$ only,

$r_m^2 \ge C\max\{N_m\eta,\ D_m,\ \sqrt{N_m}\}\tau^2.$

Let us now prove (16). Let us fix some $f \in \mathbb{R}^n$. When $f = 0$, the result
is clear by taking $S_m = \{0\}$. Then we deduce the result for general $f$ by
arguing as follows. Let us consider the random variables $\hat f_* = \hat f(Y + f) - f$
and $\hat r_* = \hat r(Y + f)$. For all $g \in \mathbb{R}^n$ and $\sigma \in I$, we have that

$\mathbb{P}_{g,\sigma}[g \in B(\hat f_*, \hat r_*)] = \mathbb{P}_{g+f,\sigma}[g + f \in B(\hat f, \hat r)] \ge 1 - \beta.$

Consequently, the pair of random variables $(\hat f_*, \hat r_*)$ satisfies (13) and thus,
by taking $r_*(\alpha, 0) = r(\alpha, f)$ we derive that

$r(\alpha, f) = r_*(\alpha, 0) \ge C\max\{\eta n, \sqrt{n}\}\tau^2.$
5.4. Proof of Propositions 2.1 and 3.1. The result of the former proposition
being a consequence of the latter by taking $\eta = 0$, we only prove
Proposition 3.1. In the sequel we set $L_m = \log(1/\beta_m)$ and $L_\alpha = \log(1/\alpha)$.
We distinguish three cases.

Case $m = n$. We derive, from (18),

$\rho_n^2 \le (n + 2\sqrt{nL_n} + 2L_n)\tau^2,$

which leads to the result.

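Case $D_m \ne 0$, $m \ne n$. Let us fix $\sigma \in I$. Since for $z$ satisfying

$\chi_{z,N_m}(q_{0,N_m}(\alpha)\tau^2/\sigma^2) < \beta_m$

we have

(35)  $z\sigma^2 + q_{0,D_m}\left(\frac{\beta_m}{\chi_{z,N_m}(q_{0,N_m}(\alpha)\tau^2/\sigma^2)}\right)\sigma^2 = -\infty,$

we bound from above the left-hand side of (35) for those $z$ satisfying

(36)  $\chi_{z,N_m}(q_{0,N_m}(\alpha)\tau^2/\sigma^2) \ge \beta_m.$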

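It follows from (19) that if $z$ satisfies (36), then

$q_{0,N_m}(\alpha)\frac{\tau^2}{\sigma^2} \ge z + N_m - 2\sqrt{(2z + N_m)L_m}$

and as we have

$2\sqrt{(2z + N_m)L_m} \le 2\sqrt{2zL_m} + 2\sqrt{N_mL_m} \le \frac{z}{2} + 2\sqrt{N_mL_m} + 4L_m$

and

$q_{0,N_m}(\alpha) \le N_m + 2\sqrt{N_mL_\alpha} + 2L_\alpha$

from (18), we deduce that $z$ satisfies

$z\sigma^2 \le \left(2\left(q_{0,N_m}(\alpha)\frac{\tau^2}{\sigma^2} - N_m\right) + 4\sqrt{N_mL_m} + 8L_m\right)\sigma^2$
$\quad \le \left(2N_m\eta + 4\sqrt{N_m}(\sqrt{L_\alpha} + \sqrt{L_m}) + 8L_m + 4L_\alpha\right)\tau^2.$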

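Thanks to (18) and the facts that $\chi_{z,N_m}(q_{0,N_m}(\alpha)\tau^2/\sigma^2) \le 1$ and $D_m \le N_m$,
we deduce that, for those $z$,

$z\sigma^2 + q_{0,D_m}\left(\frac{\beta_m}{\chi_{z,N_m}(q_{0,N_m}(\alpha)\tau^2/\sigma^2)}\right)\sigma^2$
$\quad \le \left(2N_m\eta + D_m + 2\sqrt{N_m}(3\sqrt{L_m} + 2\sqrt{L_\alpha}) + 2(5L_m + 2L_\alpha)\right)\tau^2$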
and, consequently, that

$\rho_m^2 \le \left(2N_m\eta + D_m + 2\sqrt{N_m}(3\sqrt{L_m} + 2\sqrt{L_\alpha}) + 2(5L_m + 2L_\alpha)\right)\tau^2.$

The result follows as $N_m \le n$.
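
The bookkeeping behind this bound is elementary but easy to get wrong, so a short check may help. The sketch below assumes the reading of the displays used in this case: the bound on $z\sigma^2$ is $(2N_m\eta + 4\sqrt{N_m}(\sqrt{L_\alpha}+\sqrt{L_m}) + 8L_m + 4L_\alpha)\tau^2$ and the quantile term is bounded through (18) by $(D_m + 2\sqrt{N_mL_m} + 2L_m)\tau^2$, using $D_m \le N_m$ and $\sigma \le \tau$. Function names are ours.

```python
import math


def combined_bound_terms(n_m, d_m, eta, l_m, l_alpha):
    """Return (sum of the two partial bounds, stated combined bound), both divided by tau^2."""
    z_part = (2 * n_m * eta
              + 4 * math.sqrt(n_m) * (math.sqrt(l_alpha) + math.sqrt(l_m))
              + 8 * l_m + 4 * l_alpha)
    quantile_part = d_m + 2 * math.sqrt(n_m * l_m) + 2 * l_m
    combined = (2 * n_m * eta + d_m
                + 2 * math.sqrt(n_m) * (3 * math.sqrt(l_m) + 2 * math.sqrt(l_alpha))
                + 2 * (5 * l_m + 2 * l_alpha))
    return z_part + quantile_part, combined


def square_root_gap(z, n_m, l_m):
    """z/2 + 2 sqrt(N L) + 4 L minus 2 sqrt((2z + N) L); nonnegative for z >= 0."""
    lhs = 2 * math.sqrt((2 * z + n_m) * l_m)
    rhs = z / 2 + 2 * math.sqrt(n_m * l_m) + 4 * l_m
    return rhs - lhs
```

The second function checks the intermediate inequality $2\sqrt{(2z+N_m)L_m} \le z/2 + 2\sqrt{N_mL_m} + 4L_m$, which combines $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$ with $2\sqrt{2zL_m} \le z/2 + 4L_m$.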

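Case $D_m = 0$. Arguing as above we have that for $x$ satisfying

$x^2 > \left(2n\eta + 4\sqrt{n}(\sqrt{L_\alpha} + \sqrt{L_m}) + 8L_m + 4L_\alpha\right)\tau^2$

we have that, for all $\sigma \in I$,

$\chi_{x^2/\sigma^2,\,n}(q_{0,n}(\alpha)\tau^2/\sigma^2) < \beta_m$

and therefore, by definition of $\rho_m$,

$\rho_m^2 \le \left(2n\eta + 4\sqrt{n}(\sqrt{L_\alpha} + \sqrt{L_m}) + 8L_m + 4L_\alpha\right)\tau^2,$

which leads to the result.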